Open Source

1 Introduction

The Surge Capacity Assessment Tool (SCAT) is used to assess a laboratory’s readiness for surge capacity in an emergency situation. The SCAT consists of an Excel document in which the laboratory data is recorded. In case multiple Excel files need to be analysed and compared, it can be a challenging task to extract all the data from the list of SCAT Excel files.
This document describes how multiple Excel files can be imported at once and subsequently compared and analysed.

Fig. 1: Data flow.

The code is written in R, which is widely used by researchers and data scientists. The code allows the user to combine multiple Excel files and to clean up and aggregate the data with one click. In addition, the data can be automatically analysed and exported to an easy-to-read PDF or HTML report file.

This document explains how to install R on your computer, how to install the R interface RStudio, and how to run a piece of code for the SCAT.

2 Install R for the 1st time

The R statistical software is available at the CRAN website. On this website, do the following:

  • Click ‘Download R for Windows’.

  • Then go to ‘install R for the first time’ in the base subdirectory.

The video below shows how to download and install R and RStudio.

Click to watch on Vimeo

3 Install the RStudio interface

The R programming language comes with a console that can read and execute the code. A screenshot of the R console is seen in figure 2.

Fig. 2: Standard R console.

The console is very basic, whereas other interfaces can support R with improved functionality and user friendliness.

The most commonly used interface is RStudio, with a screenshot in figure 3.

Fig. 3: RStudio console.

The RStudio Desktop version can be downloaded here for free. For more information and a video on installing (and updating) R and RStudio, you can visit this website.

4 Run code from RStudio

4.1 RStudio basics

When you open RStudio and run code, you will see a screen similar to the screenshot below. The interface has a 4-pane layout, and each pane can have multiple tabs. The panes can be arranged in whatever way you find convenient.

Fig. 4: RStudio lay-out

The Console is where you can type code that executes immediately. This is also known as the command line. It is at the bottom left of your screen.

Generally, we will want to write programs longer than a few lines. The Source Editor can help you open, edit and execute these programs. It is the pane on the top left of your screen.

The Environment pane is very useful as it shows you what objects (i.e., dataframes, arrays, values and functions) you have in your environment (workspace). For objects with a single value you can see the value itself; for longer objects R will tell you their class. When you have data in your environment that have two dimensions (rows and columns), you can click on them and they will appear in the Source Editor pane like a spreadsheet, as shown in figure 5.

Fig. 5: Spreadsheet in Source editor pane.
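
As a small illustration of what ends up in the Environment pane, the sketch below creates one single value and one two-dimensional object. The object names and numbers are made up for this example and are not part of the SCAT code.

```r
# A single value: the Environment pane shows the value itself.
n_labs <- 3

# A two-dimensional object (rows and columns): clicking it in the
# Environment pane opens it as a spreadsheet in the Source Editor pane.
labs <- data.frame(
  lab = c("Lab A", "Lab B", "Lab C"),   # hypothetical laboratory names
  samples_per_day = c(120, 80, 200)     # made-up numbers for illustration
)

class(labs)   # "data.frame"
dim(labs)     # 3 rows and 2 columns
```

Typing View(labs) in the Console opens the same spreadsheet view as clicking the object in the Environment pane.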

The panes can have a number of different tabs. The Files tab has a navigable file manager, just like the file system on your operating system. The Plots tab is where graphics you create will appear. The Packages tab shows you the packages that are installed and those that can be installed (more on this later).

Fig. 6: Pane with Packages tab opened.

The Help tab allows you to search the R documentation for help and is where the help appears when you ask for it from the Console. It is at the bottom right of your RStudio window.

The tab next to the Console is the Terminal.

Fig. 7: System shell access in the Terminal.

The terminal provides system shell access from within the ‘integrated development environment’ (IDE). You can access your system terminal (or Command Prompt/PowerShell on Windows) without minimizing windows or leaving the workspace.

Panes can also be adjusted under the ‘Preferences’ menu. Note that there might be subtle differences between RStudio installations on different operating systems.

Fig. 8: Adjust panes in Preferences.

4.2 Install packages

In R, a package is a collection of R functions, data and compiled code. The location where the packages are stored is called the library. If there is a particular functionality that you require, you can download the package from the appropriate site and it will be stored in your library. To actually use the package use the command “library(package)” which makes that package available to you.

  • Package: a collection of R functions, data, and compiled code in a well-defined format.
  • Library: the directory where packages are installed.

Packages can be installed in two ways.

  1. Type the install command in the console, for example install.packages("package"), where package is the name of the package you want to install.

Fig. 9: Install package in console

  2. In the bottom-right pane, click the Packages tab and then Install. A new window pops up in which you can type the name of the package you want to install.

Fig. 10: Install package in library pane

ASSIGNMENT
For the SCAT analysis, we need the following packages:

  • DT
  • htmlwidgets
  • tidyverse
  • janitor
  • kableExtra
  • plotly

Install these packages using one of the two methods explained above.
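
If you prefer the console method, the whole list can be handled in one go. The following is a minimal sketch, not part of the SCAT code: it checks which of the packages above are still missing and only installs those. The variable names scat_packages and to_install are chosen for this example.

```r
# Packages needed for the SCAT analysis (names are case-sensitive).
scat_packages <- c("DT", "htmlwidgets", "tidyverse", "janitor",
                   "kableExtra", "plotly")

# Packages from the list that are not yet in any of your libraries.
to_install <- setdiff(scat_packages, rownames(installed.packages()))

if (length(to_install) == 0) {
  message("All SCAT packages are already installed.")
} else if (interactive()) {
  # Downloads from CRAN; only runs in an interactive session
  # such as the RStudio Console.
  install.packages(to_install)
} else {
  message("Still to install: ", paste(to_install, collapse = ", "))
}
```

After installation, a package is made available in the current session with, for example, library(DT).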

4.3 What is Markdown?

Running the code and analyzing the data is one thing. Equally important is to report your results in a comprehensive way. For this, we use Markdown.

Markdown is a text-to-HTML conversion tool for web writers. It can also be used to convert text to PDF, ePub or other formats. In addition, Markdown easily integrates the output from the R code into text documents. A variety of programs understand Markdown and can convert it to readable text. Markdown is further explained on this website, with a video.

ASSIGNMENT

To use Markdown within RStudio, we need the rmarkdown package.

Install the package using one of the two methods explained in the previous section.

NOTE. If you are using a Mac, you might need to install additional (open source) software on your computer called LaTeX. The (La)TeX distributions can be found here

4.4 Set up the work environment

We are almost there.

ASSIGNMENT

The next step is to unpack the file SCAT_R.zip on your computer.

Ready? Then you should see the folders and files shown in the screenshot in figure 11.

Fig. 11: Unzipped files and folders.

A quick explanation of the different files and folders.

  • corp-styles.css: a Cascading Style Sheets (CSS) file. It is used to format the contents of an associated web page, such as the font type, background color, etc.
  • input folder: save your SCAT Excel files here. Three dummy SCAT Excel files are included as examples. Do not forget to remove these dummy Excel files from the input folder before you process your own Excel files. IMPORTANT: when you add your own Excel files, please use the same format for the file name: “P81 BIOSEC_SCAT_{insert name of lab or city}.xlsm”.
  • output folder: this folder contains the results of running the code. The most important file is scat_complete.csv, which contains the aggregated data collected from your SCAT Excel files (figure 12).

Fig. 12: Files in output folder.

  • SCAT_import.html: this is the output Markdown file with all your results. It can be sent as a report in an email attachment. IMPORTANT: all data from the aggregated Excel files is included, though hidden, within the HTML code.
  • SCAT_import.RmD: this is the main file with your R code and Markdown text. This is the file you need to open with RStudio.
  • SCAT.Rproj: Not relevant. Opening this file with RStudio will create a Project environment; it contains project information used to customize the behavior of RStudio, such as loading RStudio settings and restoring previously edited source documents into editor tabs.
  • source folder: contains a logo for the Markdown report and an empty bibliography file.
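
The file naming convention for the input folder makes it possible to recover the lab or city name from each file name. The sketch below illustrates the idea with base R only; it is not the code from SCAT_import.RmD. The function name lab_name_from_file and the variable input_dir are assumptions made for this example, and the working directory is assumed to be the unzipped folder.

```r
# Assumed location of the input folder, relative to the working directory.
input_dir <- "input"

# All macro-enabled Excel workbooks in the input folder.
scat_files <- list.files(input_dir, pattern = "\\.xlsm$", full.names = TRUE)

# Hypothetical helper: strip the folder, the extension and the fixed
# "P81 BIOSEC_SCAT_" prefix, leaving the lab or city name.
lab_name_from_file <- function(path) {
  name <- tools::file_path_sans_ext(basename(path))
  sub("^P81 BIOSEC_SCAT_", "", name)
}

# One row per workbook found; zero rows if the folder is empty or missing.
overview <- data.frame(
  file = basename(scat_files),
  lab  = lab_name_from_file(scat_files)
)
print(overview)
```

A file that does not follow the naming convention keeps its full name in the lab column, which makes misnamed files easy to spot.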

4.5 Run the R-Markdown file

Double-click the file SCAT_import.RmD, or open the file from within RStudio (File > Open File…). In the Source pane you can see the content of the document. It starts with a document title and some other Markdown and HTML settings (figure 13).

Fig. 13: Header of a Markdown document.

From ‘# Introduction’ onward, the text is in Markdown format (figure 14).

Fig. 14: Markdown text.

The actual R code, however, is written in a so-called chunk. Everything between the ``` delimiters is treated as code, as indicated by the orange arrows in figure 15.

Fig. 15: A chunk of code.

To run all the code in the document, click the Run button on the upper-right corner of the Source editor, and select Run All.

Fig. 16: Run all code.

It is also possible to run the code step by step using the small buttons in the upper-right corner of a code chunk. One button runs all chunks above the current one. The small green arrow on the right runs the current chunk; it might not work if you haven’t run the preceding chunks first.

Fig. 17: Run a chunk of code.

Finally, to create a nice-looking Markdown report, like the ‘SCAT_import.html’ file, you need to ‘knit’ all code and text together. Click the ‘Knit’ button at the top of the Source editor and select Knit to HTML. It might take a moment for the HTML report to be created.

Fig. 18: Knit to HTML.
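
Knitting can also be started from the Console instead of the Knit button. The following is a minimal sketch, assuming that the working directory is the unzipped folder and that the rmarkdown package is installed; the variable name rmd_file is chosen for this example.

```r
# File name as used in this document; adjust if your copy is named differently.
rmd_file <- "SCAT_import.RmD"

if (file.exists(rmd_file) && requireNamespace("rmarkdown", quietly = TRUE)) {
  # render() runs all chunks and builds the report in the output format
  # declared in the document header; it returns the path of the report.
  output_file <- rmarkdown::render(rmd_file)
  message("Report written to: ", output_file)
} else {
  message("Could not find ", rmd_file, " or the rmarkdown package. ",
          "Check the working directory with getwd().")
}
```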

5 Glossary

Chunk

A code Chunk refers to pieces of code embedded within documentation. Chunk delimiters are ```{r} and ``` .

Console

A console is a window that works as an interpreter. In the console, the code is translated into actions, such as reading a file or calculating a chi-square statistic.

CRAN

Stands for Comprehensive R Archive Network. It is a network of ftp (File Transfer Protocol) and web servers around the world that store identical, up-to-date, versions of code and documentation for R.

Knitr

knitr is an R package that integrates computing and reporting. By incorporating code into text documents, the analysis, results and discussion are all in one place. Files can then be processed into a diverse array of document formats, including the important ones for collaborative science: PDFs, Word documents, slide presentations, and web pages.

LaTeX

LaTeX is a software system for document preparation; see the corresponding Wikipedia page for an overview. LaTeX is very useful for professional typesetting, especially of scientific documents, and is the most widely used markup language for mathematical notation.

Library

A library is a directory where the packages are stored. You can have multiple libraries on your hard drive. To see which libraries are available (which paths are searched for packages): .libPaths() .

Markdown

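Markdown is a text-to-HTML conversion tool for web writers. It can also be used to convert text to PDF, ePub or other formats.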

Package

A package extends basic R functionality and standardizes the distribution of code. For example, a package can contain a set of functions relating to a specific topic or task. To see which packages are in your library: lapply(.libPaths(), dir) .

RStudio

RStudio is a powerful and easy way to work with the R programming language. It is an Integrated Development Environment (IDE) that provides a one-stop solution for statistical computing and graphics. You must have R installed to install, run, and use RStudio on your desktop.

SCAT

Surge Capacity Assessment Tool (SCAT), created in Excel for the assessment of a (molecular) diagnostic laboratory’s outbreak preparedness.

6 FAQ

Possible error codes.

Line 97 Error in 'mutate()':  
! Problem while computing '25h Surge Calculator = map(value,

You may have one or more Excel files open.
Solution: close all Excel files before running code.
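
Excel typically places a temporary owner file, whose name starts with ~$, next to a workbook that is open. Assuming that such files are what the import stumbles over (the source does not confirm this), a quick check of the input folder could look like the sketch below. The function name find_open_workbooks is hypothetical and not part of the SCAT code.

```r
# Hypothetical helper: list Excel owner ("lock") files in a folder.
# Their names start with "~$"; an empty result means none were found.
find_open_workbooks <- function(input_dir) {
  list.files(input_dir, pattern = "^~\\$", all.files = TRUE)
}

open_files <- find_open_workbooks("input")  # assumed input folder location
if (length(open_files) > 0) {
  message("Close these workbooks in Excel first: ",
          paste(open_files, collapse = ", "))
} else {
  message("No open workbooks detected in the input folder.")
}
```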

Export file is not being saved.

Possibly, computer settings or antivirus software are preventing RStudio from saving a file.

Solution: change the settings to allow RStudio to save files.
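
To check whether R is allowed to write to a folder at all, you can try to create a small test file there. This is a rough sketch under the assumption that the export goes to the output folder; the function name can_write_to is made up for this example.

```r
# Hypothetical helper: try to create (and then remove) a small test file.
# Returns TRUE if the file could be created, FALSE otherwise.
can_write_to <- function(dir) {
  test_file <- tempfile(pattern = "write_test_", tmpdir = dir)
  ok <- suppressWarnings(file.create(test_file))
  if (ok) unlink(test_file)
  ok
}

can_write_to("output")  # assumed output folder, relative to the working directory
```

A result of FALSE means that the folder does not exist in the current working directory or that writing to it is blocked.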